The performance of modularity maximization in practical contexts 
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Although widely used in practice, the behavior and accuracy of the popular module identification 
technique called modularity maximization is not well understood in practical contexts. Here, we 
present a broad characterization of its performance in such situations. First, we revisit and clarify 
the resolution limit phenomenon for modularity maximization. Second, we show that the modularity 
function Q exhibits extreme degeneracies: it typically admits an exponential number of distinct high- 
scoring solutions and typically lacks a clear global maximum. Third, we derive the limiting behavior 
of the maximum modularity Qmax for one model of infinitely modular networks, showing that it 
depends strongly both on the size of the network and on the number of modules it contains. Finally, 
using three real-world metabolic networks as examples, we show that the degenerate solutions can 
fundamentally disagree on many, but not all, partition properties such as the composition of the 
largest modules and the distribution of module sizes. These results imply that the output of any 
modularity maximization procedure should be interpreted cautiously in scientific contexts. They 
also explain why many heuristics are often successful at finding high-scoring partitions in practice 
and why different heuristics can disagree on the modular structure of the same network. We conclude 
by discussing avenues for mitigating some of these behaviors, such as combining information from 
many degenerate solutions or using generative models. 



I. INTRODUCTION 



Networks are a powerful tool for understanding the 
structure, dynamics, robustness and evolution of complex 
biological, technological and social systems pQ [5] ■ The 
automatic detection of modular structures in networks — 
also called communities 3J or compartments [4] , and con- 
ventionally understood to be large subgraphs with high 
internal densities — can provide a scalable way to identify 
functionally important or closely related classes of nodes 
from interaction data alone O [6] . 

The identification of modular structures has broad im- 
plications for many systems-level questions. For instance, 
it has strong consequences for the behavior of dynamical 
processes on networks [7j |8] , and can provide a principled 
way to reduce or coarse-grain a system by dividing global 
heterogeneity into relatively homogeneous substructures. 
The search for such modular substructures has been par- 
ticularly intense in molecular networks. This is, in part, 
because modules have theoretical significance for molec- 
ular networks [51fTT|: they can correspond to functional 
clusters of genes or proteins O [T3] , they may represent 
targets of natural selection above the level of individual 
genes or proteins but below the level of the whole organ- 
ism, and they may provide evidence of past evolution- 
ary constraints or pressures [131 [H]. Past work along 
these lines has identified modular structures in signal- 
ing, metabolic and protein- interaction systems |14H17j . 
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although some questions remain about their statistical 
significance flB] and functional relevance [TH] . Naturally, 
many of these questions apply equally well to modules in 
social and technological networks. 

Empirical evidence for a modular organization is typ- 
ically derived using computer algorithms that automat- 
ically identify modules using network connectivity data, 
and among the many techniques now available (see 
Refs. [SI [50] for reviews), the method of modularity 
maximization [3] is by far the most popular. Under this 
method, each decomposition or partition of a network 
into k disjoint modules is given a score Q, called the 
modularity or sometimes "Newman- Girvan modularity" : 
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where is the number of edges in module z, di is the total 
degree of nodes in module i and m is the total number of 
edges in the network. Intuitively, Q measures the differ- 
ence between the observed connectivity within modules 
and its expected value for a random graph with the same 
degree sequence [5T]. Thus, a "good" partition — with Q 
closer to unity — identifies groups with many more inter- 
nal connections than expected at random; in contrast, a 
"bad" partition — with Q closer to zero — identifies groups 
with no more internal connections than we expect at ran- 
dom. This reasonable formulation recasts the problem of 
identifying modules as a problem of finding the so-called 
optimal partition, i.e., the partition that maximizes the 
modularity function Q. 

Despite the popularity of modularity maximization, 
much remains unknown about the quality and signifi- 
cance of its output when applied to real-world networks 
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with unknown modular structure. Most past work has 
focused on developing new ways of detecting modules, 
rather than on characterizing their performance in prac- 
tical situations. In general, maximizing Q is known to be 
NP-hard ^22j, but many heuristic approaches — including 
greedy agglomeration [23l - f25] . mathematical program- 
ming [211 , spectral methods [371 HHI : extremal optimiza- 
tion 29], simulated annealing [Uj and sampling tech- 
niques [ini ED — perform well on simple synthetic net- 
works with strong modular structure [31 and they often 
succeed at finding high-modularity partitions in practice. 
The apparent success of these methods has led to their 
widespread adoption, and often (but not always [SHIH]) 
the implicit acceptance of several assumptions: (i) em- 
pirical networks with modular structure tend to exhibit a 
clear optimal partition |181 130j , (ii) high-modularity par- 
titions of an empirical network are structurally similar to 
this optimal partition and (iii) the estimated Qmax can 
be meaningfully compared across networks |33| . 

Here, we present a broad characterization of the be- 
havior of modularity maximization in practical contexts. 
First, we revisit and clarify the resolution limit phe- 
nomenon [321 [53H3S]- We then show that the above 
assumptions do not hold when modularity maximiza- 
tion is applied to networks with modular or hierarchi- 
cal structure. Using a combination of analytic and nu- 
merical techniques (whose implementations are available 
online^), we show that the modularity function Q ex- 
hibits extreme degeneracies such that the globally max- 
imum modularity partition is typically hidden among 
an exponential number of structurally dissimilar, high- 
modularity solutions. We then derive the asymptotic 
behavior of Qmax in the limit of infinitely modular net- 
works, showing that it depends strongly on both the size 
of the network and on the number of modules it contains. 
Finally, using three real-world examples of metabolic net- 
works, we show that the degenerate solutions can funda- 
mentally disagree on many (but not all) partition prop- 
erties such as the composition of the largest identified 
modules and the distribution of module sizes. This latter 
finding poses a serious problem for scientific applications. 

Together, these results significantly extend our under- 
standing of the behavior and results of modularity max- 
imization in practical contexts. When applied to net- 
works with modular structure, these results imply that 
any particular partition of a real-world network found by 
maximizing modularity should be interpreted cautiously. 
In principle, there is nothing special about the modular- 
ity function with respect to the degeneracy result and we 
expect that other module identification techniques will 
exhibit similar behavior. We conclude by discussing the 
prospects of ameliorating these issues, for instance, by 
combining information across degenerate solutions or us- 
ing generative models. 



II. THE RESOLUTION LIMIT REVISITED 

Recently, Fortunato and Barthelemy showed that mod- 
ularity admits an implicit resolution limit 32 , in which 
the maximum modularity partition can fail to resolve 
modules smaller than a size, or weight [35] , that depends 
on the total weight of edges in the network m. This vi- 
olates the notion that the quality of a module should be 
a local property. As a result, intuitively modular struc- 
tures, such as cliques of moderate size, can be hidden 
within large agglomerations that yield a higher modular- 
ity score, and the peak of the modularity function (the 
optimal partition) may not coincide with the partition 
that correctly identifies these intuitive modules (the in- 
tuitive partition). 

This behavior is sometimes described as an implicit 
preference on edge weight within identified modules. But 
as we show here, the resolution limit is better understood 
as a consequence of assuming that inter-module connec- 
tions follow a random graph model, which induces an 
explicit preference on the weight of between-module con- 
nections. Thus, it should not be surprising that other 
partition score functions that make similar random-graph 
assumptions about inter-module edges, such as Potts- 
models [35 and several likelihood-based [35] techniques, 
also exhibit resolution limits. 

To begin, we consider the change in modularity AQ 
obtained by merging two modules in the intuitive par- 
tition. If this change is positive, then the modularity 
function will fail to resolve the intuitive modules, since 
a higher modularity score is achieved by merging them. 
Let Ci and be the number of edges within the mod- 
ules and Cij be the number of edges between them. The 
change in Q for merging them |32j is 

which is positive whenever 



That is, independent of the modules' internal structure, 
a merger is beneficial whenever the observed number of 
inter-module edges Cij exceeds the number expected for 
a random graph with the same degree sequence {cij) — 
did j /2m [6]. This behavior is particularly problematic for 
large unweighted networks because modularity tends to 
expect (eij) < 1 while the minimum inter-module weight 
is Cij = 1, i.e., a single edge. On the other hand, as we 
show below, weighted networks whose inter-module con- 
nectivity approximates the null expectation can escape 
the resolution limit completely. 

A. Two examples 
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To illustrate the subtlety of this behavior, we consider 
two versions of the ring network model [3 2) , in which k 
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That is, we completely connect the cliques using edges 
with weight Cij = 2/{k — 1). Notably, the total weight 
of a module here is exactly the same as in the example 
above, that is, di — + 2, and the total weight in the 
network grows with k. But, the penalty for merging a 
pair of connected cliques in this network is given by 



FIG. 1: (color online) A schematic of a ring network with k = 
24 cliques of c = 5 nodes each (shaded circles) joined by single 
links to form a ring. The intuitive partition, which places 
individual cliques on their own, has modularity Qi = 0.8674, 
while the optimal partition (the 2-cIique tiles), which merges 
adjacent cliques, has slightly larger modularity Q2 = 0.8712. 



cliques, each containing c nodes, are connected by single 
edges to form a ring (Fig.[T]). The intuitive partition here 
places each clique in a group by itself and, at least for 
small values of k, this is also the optimal partition. The 
penalty for merging a pair of adjacent cliques is given by 
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AQ = 



which is positive whenever 
k>2 



-2k-' , (4) 
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Thus, there is some number of cliques k* above which the 
modularity function gives a higher score to a partition 
that merges pairs of adjacent cliques. 

[We note that this argument generalizes to merging £ 
adjacent cliques: an ^-merged partition has greater mod- 
ularity than an {£ — l)-merged partition whenever 



k >£{£-!) 



1 



(6) 



Thus, the resolution limit is multi-scale and for large con- 
nected networks, intuitively modular structures can be 
hidden within very large agglomerations.] 

The resolution limit appears in the ring network be- 
cause each module is connected to its nearest neighbors 
with a constant weight e^t = 0(1), while the null model 
expects this weight to decrease with k. Thus, there must 
be some size of the network, i.e., a value of k, above which 
even a single unweighted edge between two cliques will 
appear "surprising" under the null model, and modular- 
ity will favor merging these minimally connected mod- 
ules. 

Some kinds of weighted networks, however, do not ex- 
hibit a resolution limit. For instance, consider a weighted 
variation of the ring network, composed of k cliques, 
each containing c nodes and each with internal weight 
Bi = (2). Now, instead of connecting each clique to two 
others to form a ring, we take the same total weight and 
spread it evenly across connections to every other clique. 



which is negative for all k > 2. Thus, it is never beneficial 
to merge a pair of cliques and there is no resolution limit 
in this network. 

Surprisingly, despite the fact that the internal and ex- 
ternal module weights in both of these toy networks are 
exactly the same, one exhibits a resolution limit while 
the other does not. The crucial difference between these 
examples is that in one the weight of an inter-module con- 
nection is independent of the size of the network, while 
in the other, it decreases. This dependence allows the 
observed inter-module connectivity between any pair of 
modules to closely follow the connectivity expected un- 
der the null model, and to avoid the condition given by 
Eq. ^ for merging two modules, even though the total 
weight in the network grows without bound. 

Thus, we see that the resolution limit is better ex- 
plained as a systematic deviation between the inter- 
module connectivity Cij and the connectivity expected 
under the random-graph null model (cij) = didi/2m, 
than as an implicit preference on the weight of edges 
within modides. 



B. A broader perspective 

For unweighted networks, and for many weighted ones, 
Fortunato and Barthelemy correctly argue that the res- 
olution limit poses a very real problem for the direct in- 
terpretation of the optimal partition's composition. 

On the other hand, since the intuitive partition is al- 
ways a refinement of the optimal partition, cleverly de- 
signed algorithms may be able to circumvent the resolu- 
tion limit in some cases. For instance, divisive algorithms 
that recursively partition large modules [27l |28l [37] while 
progressively lowering the resolution limit may be able to 
find the appropriate refinement (although some problems 
can remain if the divisions are always binary |28j). Al- 
ternatively, the history of merges within agglomerative 
algorithms may provide a way to identify the intuitive 
modules that were merged to obtain the optimal parti- 
tion [231 ES] . Multi-scale modularity-based methods [551 - 
|40] allow a researcher to specify a target resolution limit 
and identify modules on that scale, but it is typically not 
clear how to choose the "correct" target resolution a pri- 
ori. Finally, Berry et al. |36j recently showed that in some 
situations, the resolution limit can be circumvented with 
a clever edge-weighting scheme. These possibilities are 
encouraging, but most have yet to be fully characterized. 
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More generally, this discussion of intuitive versus opti- 
mal partitions ignores two subtle problems in the general 
task of identifying modules from connectivity data alone. 
First, there is the choice of a random graph as the null 
model, which, as we showed above, plays a critical role in 
generating resolution limits. If we could instead choose a 
null model with more realistic assumptions about inter- 
module connectivity, unintuitive merges would become 
less likely. For instance, the null model assumes that 
an edge emerging from some module can, in principle, 
connect to any node in the network, but for real-world 
systems this assumption is rarely accurate (a point also 
recently made by Fortunato 

A related issue is that the null model is unforgiving of 
sampling fluctuations, even those naturally generated by 
the null model itself. That is, the null model is a mean- 
field assumption, while actual networks — even those 
drawn from the null model — naturally deviate from 
these expected values. Such fluctuations are ultimately 
responsible for the non-zero maximum modularity scores 
observed in homogeneous random graphs [4lj . This issue 
is more severe in sparse networks, where the expected 
inter-module connectivity will tend to be less than one 
edge, while sampling alone will generate a non-trivial 
number of such connections. Modularity will see these 
connections as "surprising" and may mistake them for 
structure internal to a module. The Berry et al. edge 
reweighting approach can serve to dampen this effect 
by reducing the relative weight of inter-module edges so 
that they appear closer to what we expect under the null 
model. A more tolerant definition of modularity might 
only merge groups of nodes if their observed intercon- 
nectivity were statistically significant relative to the null 
model (an approach hinted at by Refs. P^ l [5T | [57 1 V 

Second, and more fundamentally, in order to distin- 
guish an optimal partition from an intuitive partition, 
we must assume an external definition of an "intuitive" 
module. The fact that there exist instances where mod- 
ularity maximization produces counter-intuitive results, 
i.e., results that clash with our external definition, simply 
highlights the difficulty of constructing a mathematical 
definition of a module that always agrees with our intu- 
ition. Indeed, it is unknown whether our intuition is even 
internally consistent. 

Precisely the same difficulty lies at the heart of a 
decades-long and ongoing debate over how best to iden- 
tify "clusters" in spatial data, which are convention- 
ally understood to be non-trivial groups of points with 
high internal densities. For instance, Kleinberg recently 
proved that no mathematical definition of a spatial clus- 
ter can simultaneously satisfy three intuitive require- 
ments ^43j, while Ackerman and Ben-David, taking a 
different approach, arrived at a contradictory conclu- 
sion |44j . For identifying modules in complex networks, 
the debate has only just begun and it remains to be seen 
whether "impossibility" results from spatial clustering 
also apply to network clustering. 



III. EXTREME DEGENERACY AMONG 
HIGH-MODULARITY PARTITIONS 

If we take the mathematically principled stance and ac- 
cept modularity's definition of a good module, i.e., we do 
not assume any external notions, the modularity func- 
tion still admits a subtle and problematic behavior for 
practical optimization techniques: even when it is not 
beneficial to merge two modules, i.e., when AQy < 0, 
the penalty for doing so can be very small. Further, as 
the number of modular structures increases, the num- 
ber of ways to combine them in these suboptimal ways 
grows exponentially. Together, these properties lead to 
extreme degeneracies in the modularity function, which 
are problematic both for finding the maximum modu- 
larity partition and for interpreting the structure of any 
particular high-modularity partition. Thus, we have a 
highly counter-intuitive situation: as a network becomes 
more modular, the globally optimal partition becomes 
harder to find among the growing number of suboptimal, 
but competitive, alternatives. 



A. Modular networks 

To make this argument quantitative, consider a net- 
work composed of k sparsely interconnected groups of 
nodes with roughly equal densities di ~ 2m/ k. Even 
when TO is small enough that the intuitive partition co- 
incides with the optimal partition (i.e., when there is no 
clash between our intuition and the definition of mod- 
ularity), Eq. ([2]) shows that the change in modularity 
for merging a pair of these groups is bounded below 
by AQij = —2k~^. For a moderate choice of fc = 20, 
this change is at most AQij = —0.005, which implies 
that these alternative partitions have modularities very 
close to Qrnax- As the number of groups k increases, 
this penalty tends toward zero, and it becomes increas- 
ingly difficult for the modularity function to distinguish 
between the optimal partition and these suboptimal al- 
ternatives. 

If there were only a few competitive alternatives, this 
degeneracy problem might be manageable. Unfortu- 
nately, their number grows combinatorially with the 
number of modular structures k. Its precise behavior de- 
pends on the inter-module connectivity, but is bounded 
below by 2'^"^ and above by the kth Bell number. 

The lower bound can be seen by considering the con- 
nected modular network with the fewest inter-module 
edges. This is given by the "string" network, which is 
a ring network with one intcr-module edge removed. In 
this case, the number of suboptima is equal to the num- 
ber of ways we can cut inter-module edges to divide the 
k groups into connected components. Because there are 
k — 1 such edges, each of which can be either cut or not 
cut, the number of partitions we can obtain this way is 
exactly 2''~^. 

The upper bound comes from a network where each of 
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the k groups is connected to every other group, and the 
number of suboptima here is exactly equal to the number 
of ways to partition the k groups into k' groups of groups, 
for all choices of k' . This is given by the fcth Bell number, 
which grows faster than exponentially. 

The intermediate levels of degeneracy correspond to 
networks with varying degrees of inter-module connectiv- 
ity: sparse modular networks will be closer to the lower 
bound, while dense modular networks will be closer to 
the upper bound. 

At face value, this finding seems to contradict that of 
Massen and Doye |2D], who argued that empirical net- 
works with modular structure tend to exhibit a strong 
global peak around the optimal partition. This is a red 
herring. Recall that the total number of partitions of n 
nodes grows like the nth Bell number, while the fraction 
of these that correspond to degenerate solutions is van- 
ishingly small, since k < n. Thus, the modularity func- 
tion is strongly peaked in a relative sense: even a super- 
exponential number of degenerate, high-modularity so- 
lutions can still be a vanishingly small fraction of the 
enormous number of partitions in general. 

B. Hierarchical networks 

In addition to modular structure, many networks ex- 
hibit hierarchical structure, in which their nodes divide 
into groups that further subdivide into groups of groups, 
etc. over multiple scales [M l [3T 1 [45H47j . and where groups 
that are closer together in this hierarchy tend to be more 
densely interconnected. Here, we show that such net- 
works exhibit at least as many degenerate solutions as 
simple modular networks, and that the modularity scores 
of alternative solutions can be even closer. 

Suppose that the optimal partition of a hierarchical 
network contains two modules i and each of which is 
composed of exactly two subgroups so that i = {a, 6} and 
j — {c, d}. Let us first split i and j into their constituent 
subgroups {a, b, c, d} and then merge the opposite pairs of 
subgroups to obtain the suboptimal partition i' ~ {a, c} 
and j' = {b,d}. From Eq. ([2|, the change in modularity 
Q for this operation is exactly 

AQ = {AQac + AQbd) - {AQab + AQ,<j) 

(eac + ebd) - (e ah 1 Ccd J 
m 

-^{:^){'^)' '«) 

Unlike Eq. ([2]), the size of the penalty now depends only 
on differences in connectivities, rather than on their ab- 
solute values, and will thus tend to be much smaller than 
the penalty discussed in the previous section. 

If the network's hierarchical structure is relatively bal- 
anced (i.e., submodules at the same level in the hierarchy 
have similar degree d) then Eq. ([8| will be dominated by 
its first term, whose size depends only on the differences 



in the pairwise connectivities of the submodules. This is 
very small both when the groups i and j are close to each 
other in the hierarchy, e.g., are siblings or cousins, and 
thus have similar inter- and intra-module connectivities, 
and when i and j are relatively low in the hierarchy, and 
thus have few connections to begin with. 

Furthermore, each level of a hierarchy presents its own 
set of modular structures that can be merged, either 
within the same level or between different levels of the 
hierarchy. The number of ways these structures can be 
combined depends on their particular hierarchical organi- 
zation and the number of connections between distantly 
related groups. Generally, however, it follows the same 
bounds we showed above for a non-hierarchical network, 
i.e., at least 2'^"^ and no more than the fcth Bell number, 
when there are k modular structures at the lowest level 
of the hierarchy. 

We also note that hierarchical problems are not, in 
fact, limited to hierarchical networks. The resolution 
limit phenomenon, which tends to produce agglomera- 
tions with modular substructure, creates effective hierar- 
chical structure even in a non-hierarchical network and 
thus can induce hierarchy-style degeneracies in the mod- 
ularity function. 

In both cases considered above, the existence of ex- 
treme degeneracies in the modularity function does not 
depend on the detailed structure of the particular net- 
work or on any external notion of a "true" module. In- 
stead, these solutions exist whenever a network is com- 
posed of many groups of nodes with relatively few inter- 
group connections. In a sense, these groups consti- 
tute the "building blocks" used to construct the high- 
modularity partitions. 

Fundamentally, these degeneracies arise because the 
modularity function does not strongly penalize partitions 
that combine such groups and the degeneracies are legion 
because there are at least an exponential number of such 
combinations. As a consequence, the modularity function 
is not strongly peaked around the optimal partition — in 
physics parlance, the modularity function is glassy — in 
precisely the case that we would like modularity maxi- 
mization to perform best: on modular networks. 

We note that similar degeneracies are likely to oc- 
cur in other kinds of module-identification quality func- 
tions, including some likelihood-based functions [15] and 
they persist under directed, weighted, bi-partite and 
multi-scale generalizations of modularity. For the 7- 
generalization of Q introduced by Reichardt and Born- 
holdt [5H^, choosing 7 < 1 increases the severity of the 
degeneracy problem, by reducing the penalty for merging 
modules, while choosing 7 > 1 reduces it somewhat, by 
increasing the penalty. For any fixed 7, however, there 
exist many networks that will exhibit severe degeneracies, 
and, moreover, it remains unclear how to identify the 
"correct" value of 7 without resorting again to an exter- 
nal definition of a "true" module. Similar issues apply to 
other parametric generalizations of modularity [391 140j . 
For most weighted networks, the degeneracy problem will 
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tend to be stronger because weights effectively reduce the 
penalty for merging some modules. Also, many weighted 
networks are dense and, as we showed above, these ex- 
hibit many more degenerate solutions than sparse net- 
works (even if they may not necessarily exhibit a resolu- 
tion limit; see Section [lT|) . 

Of course, an optimal partition always exists, even if 
it may be almost impossible to find in practice. But the 
scientific value of the optimum does seem somewhat di- 
minished by the enormous number of structurally diverse 
alternatives that are only slightly "worse" from the per- 
spective of their modularity scores. That is, without ex- 
ternal information, it becomes unclear which particular 
partition, within the enormous number of roughly equally 
good ones, is more scientifically meaningful [26'. And, 
requiring such external information defeats the purpose 
of identifying modules automatically from connectivity 
data alone. 



IV. THE LIMITING BEHAVIOR OF Q„ 
MODULAR NETWORKS 
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In addition to the location of the peak of the mod- 
ularity function and its surrounding structure, another 
important question is the expected height of the peak. 
In this section we show that, in the asymptotic limit of 
an increasingly modular network — i.e., as we add more 
modules to the network — the height of the modularity 
function converges to Qmax = 1- This analysis fills an im- 
portant gap in our understanding of the modularity func- 
tion's behavior in practical contexts, as previously only 
unrealistic cases such as lattices, trees and Erdos-Rcnyi 
random graphs [HI HH] have been considered. Notably, 
these results serve to explain why large values of Qmax 
are often found for very large real- world networks [25] . 

Consider a sparse network with n nodes, m = 0{n) 
edges and an optimal partition that contains k modules. 
Because modularity is a summation of contributions from 
individual terms, we may rewrite Eq. ([T]) for the optimal 
partition as 
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where (e) = - is the average number of edges 
within an optimal module, Var(.) is the variance func- 
tion and {d) = j: ^ ^ di is the average degree of an optimal 
module. 

Now, imagine a process by which we connect new mod- 
ular subgraphs of some characteristic size (e) — 0(1) to 
the network, i.e., we assume that the average size of a 
module does not increase as the network grows (but see 
below), and consider the asymptotic dependence of Qniax 
in the limit of this infinitely modular network. 



It will be convenient to rewrite the average degree of a 
module as 



(d) 
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where (e° 



i J2i ^' denotes the average number of 
outgoing edges in each module. Because modules do not 
grow with the size of the network, the number of modules 
k is 0{n), and hence the average out-degree (e°"') must 
be 0(1) to ensure that the network remains sparse. This 
implies that the average degree (d) is also 0(1). Finally, 
because Var(c?i/2m) — in the limit, we may ignore the 
last term in Eq. (jof. 



Combining the expression for (d) [Eq. ( 10 )] with the ex- 
pression for the maximum modularity [Eq. (19])], we have 
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Rewriting the number of edges in the network m as a 
function of the connectivity of the optimal modules 



m=h{d) = '^{2{e) 



and carrying out the summation in Eq. (11), we see that 
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Thus, in the limit of fc cxi, Qmax approaches some 
constant less than 1, which depends only on the relative 
proportion of internal to external edges in each module. 
However, this analysis is incomplete in a crucial way: it 
ignores the impact of the resolution limit described in 
Section |llj which can cause the average size of a mod- 
ule in the optimal partition to grow with the size of the 
network [31[31H3S]. 

When the resolution limit causes two groups of nodes 
to be merged, the links joining them become internal, 
which in the limit causes the average out-degree of a mod- 
ule in the optimal partition to be asymptotically domi- 
nated by its average internal density, i.e., (e°"*) = o((e)). 
This, in turn, implies that (e°"*)/(e) — > as fc — > oo. Ac- 
counting for this resolution-limit induced agglomeration, 
we now see that the first term in Eq. (12 1 approaches 



1 while the second term approaches 0, implying that 
Qmax — >■ 1 as fc — )■ oo. [Rccall, however, that this last 
step does not hold for all weighted networks: consider a 
limiting process in which each module connects to 0(fc) 
other modules with total weight o(fc), e.g., the collection 
of fc cliques connected to each other by edges of weight 
2/(fc — 1) from Section |llj The resolution is not present in 
such a network and hence Qmax merely approaches the 
value given in Eq. (12).] 



Thus, just as the severity of the degeneracy problem 
depends strongly on the number of modular structures 
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in the network, so too does the height of the modularity 
function. Further, the number of these structures k is 
limited mainly by the size of the network, since there 
cannot be more modular structures than nodes in the 
network. In practical contexts, variations in n are very 
likely to induce variations in Qmax and increasing n (or 
k) will generally tend to increase Qmax- If the intention 
is to compare modularity scores across networks, these 
effects must be accounted for in order to ensure a fair 
comparison. 

Of course, the precise dependence of Qmax on n and k 
depends on the particular network topology and how it 
changes as n or fc increases. For instance, in Appendix [A} 
we derive the exact dependence for the ring network and 
calculate precisely how many of its degenerate solutions 
lie within 10% of Qmax- Because of this dependence, 
an estimate of Qmax for any empirical network should 
not typically be interpreted without a null expectation 
based on networks with a similar number of modules. 
For instance, detailed values of Q should probably not 
be compared across different networks, as in a regression 
of modularity Q versus network size n |33j . 

Finally, we point out that this dependence of Qmax on 
n and k makes intuitive sense given that the null model 
against which the internal edge fractions of the modules 
are scored [the second term in Eq. ([T])] is a random graph 
with the same degree sequence (see Section |ll]) . That is, 
as the number of modules increases, it is increasingly 
unlikely under the null model that any edges fall within 
a particular module given the huge number of possible 
connections to other modules. In this sense, it is not at 
all surprising that extremely high modularity values have 
been found for extremely large real-world networks. For 
instance, Blondel et al. 25j estimated Qmax > 0.984 for 
one Web graph with 118 million nodes and Qmax > 0.979 
for a different Web graph with 39 million nodes. Such 
high values may not indicate that they are particularly 
modular, but instead that they are simply very different 
from a random graph with the same degree sequence. 



MAPPING THE MODULARITY 
LANDSCAPE 



To get a more intuitive handle on the precise struc- 
ture of the modularity function, we now describe a nu- 
merical technique for reconstructing a locally accurate, 
low-dimensional visualization of it. We then apply this 
technique to instances of synthetic modular or hierarchi- 
cal networks. In the next section, we apply it to three 
real-world networks and show that the high-modularity 
partitions of empirical networks can disagree strongly on 
many, but not all, partition properties. 





FIG. 2: (color online) The modularity function of a ring net- 
work (k = 24 and c = 5), reconstructed from 997 sampled 
partitions (circles), showing a prominent high- modularity 
plateau. The vertical axis gives the modularity Q; the x- 
and y-axes are the embedding dimensions. (These dimen- 
sions are complicated functions of the original partition space 
and thus their precise scale is not relevant; see Appendix [d|| . 
Note that the structure within the plateau (inset) is highly ir- 
regular, illustrating the severe degeneracies of the modularity 
function. 



A. The reconstruction technique 

Our focus here is on the modularity function's struc- 
ture in the vicinity of the degenerate high-modularity 
partitions identified in Section |lIT] To do this, we sample 
partitions using a simulated annealing (SA) algorithm. 
Each SA sample was started from a random initial parti- 
tion and stopped either at a randomly chosen step (75% 
of runs) or at a local optimum (25% of runs) . This mix- 
ture of stopping points ensures that our sample contains 
both a large number of local optima as well as a sampling 
of sub-optimal partitions in their vicinity. Complete de- 
tails of the sampling approach are given in Appendix [B} 

To reconstruct and visualize the structure implied by 
these sampled partitions, we embed them as points in a 
2-dimensional Euclidean space such that we largely pre- 
serve their pairwise distances. The distance between par- 
titions is measured by one popular distance metric for 
partitions, called the variation of information (VI) [50] 
and defined as follows. Given partitions C and C", the 
variation of information between them is defined as 



VI(C, C") = H{C, C") - I{C; C") 



(13) 



where H{.; .) is the joint entropy [s ee E q. ( |C3[ )] and /(.; .) 
is the mutual information [see Eq. (C4)] between the two 
partitions. Additional details are given in Appendix [C| 
We note that using other measures of partition distance. 



FIG. 3: (color online) The modularity function of a hierarchi- 
cal random graph model with n = 256 nodes arranged 
in a balanced hierarchy with assortative modules (see Ap- 
pendix [e]|, reconstructed from 1199 sampled partitions (cir- 
cles), and its rugged high-modularity region (inset). 



FIG. 4: (color online) The modularity function for the 
metabolic network of the spirochaete Treponema pallidum 
with n = 482 nodes (the largest component) and 1199 sam- 
pled partitions, showing qualitatively the same structure as 
we observed for hierarchical networks. The inset shows the 
rugged high-modularity region. 



such as one based on the Jaccard coefficient, yields simi- 
lar results (see Fig. [o] in Appendix [C|). 

The embedding portion of our reconstruction tech- 
nique seeks an assignment of partitions {Ci} to coordi- 
nates {{xi,yi)} such that the pairwise distances between 
partitions are largely the same as the pairwise distances 
between embedded points. Only the relative positions of 
points in the embedded space are significant; their pre- 
cise locations are meaningless. Then, by assigning each 
embedded point a value in a third dimension equal to 
the modularity score Qi of the corresponding partition, 
we can directly visualize the structure of the sampled 
modularity function. 

Because we are interested in the function's degenera- 
cies, we prefer an assignment that errs on the side of be- 
ing more smooth, i.e., less rugged, in the projected space 
than in the original partition space. Methods like princi- 
pal component analysis use a linear function to measure 
the quality of the embedding, which can cause some local 
structure to be lost or distorted when projecting from 
non-Euclidean spaces like the partition space. Instead, 
we use a technique called curvilinear component analysis 
(CCA) [51], which preserves local distances at the ex- 
pense of some distortion at larger distances. Thus, our 
reconstructed modularity landscapes are appropriately 
conservative, sometimes reducing the apparent rugged- 
ness of the reconstructed landscape, but never creating 
ruggedness where it does not exist in the first place. 

Additional details of the CCA technique are given 
in Appendix [Dj For completeness, we note that several 
other approaches to mapping specific features of the mod- 
ularity landscape are described in Refs. [511 [5^ [55] . 



B. Reconstructed modularity functions for 
modular and hierarchical networks 

Using a ring network with fc = 24 and c — 5 (Fig. [I]), 
Fig. [2] shows the modularity function reconstructed from 
nearly 1000 sampled partitions. Examining these in de- 
tail, we see that every low-modularity partition divides 
many cliques across different groups, which leads to low 
values of Q. In contrast, the high-modularity partitions 
are composed of various groupings of the cliques, as pre- 
dicted in Section [III[ In the embedded modularity func- 
tion, these high-modularity partitions tend to cluster to- 
gether, forming a distinct "plateau" region. Within this 
region, the function shows complicated degeneracies and 
no clear maximum (Fig. [2] inset). 

Now turning to the case of a hierarchically structured 
network, we use a simplified version of the recently in- 
troduced hierarchical random graph (HRG) model |47j . 
in which we organize n = 256 nodes into nested mod- 
ules using a balanced binary tree structure — so that sub- 
modules at the same level in the hierarchy have similar 
sizes — and an assortative connectivity function — so that 
submodules become more internally dense as we descend 
the hierarchy from large to small groups. Appendix |E] 
gives the precise details of the HRG model we use and 
analytically derives its optimal partition. 

From this model, we drew 100 network instances and 
combined sampling results from this ensemble to smooth 
out deviations caused by fluctuations in the random 
graph structure [SUl SI]- As a consequence, the recon- 
structed modularity function is smoother than it would 
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be for any particular instance and shows only the struc- 
ture induced by the hierarchical organization of the net- 
work. Figure [3] shows the reconstructed modularity func- 
tion for nearly 1200 sampled partitions. Again exam- 
ining these partitions in detail, we find that nearly all 
of the high-modularity partitions in the "plateau" re- 
gion mix submodules from different levels of the hier- 
archy and often fail to resolve distinct branches, as pre- 
dicted in Section III Like the ring network (Fig. [2]), the 
high-modularity region in this case is extremely rugged, 
with many peaks and valleys and no clear global opti- 
mum (Fig. [sj inset). 

As mentioned above, the CCA embedding technique 
only guarantees a lower bound on the ruggedness of the 
reconstructed modularity function. Thus, what appear 
to be local minima in the embedding are actually quite 
likely to be local maxima themselves and the true rugged- 
ness is almost surely more extreme than it appears in 
these visualizations. 



VI. STRUCTURAL DIVERSITY AMONG 
HIGH-MODULARITY PARTITIONS 



Although our analytic arguments are entirely general, 
our numerical results have focused on specific synthetic 
networks derived from models of modular and hierarchi- 
cal networks. These models may not be representative 
of the networks found in the real world, since they lack 
certain properties commonly observed in real-world net- 
works (to name a few simple properties: unequal module 
sizes and heavy-tailed degree distributions |54j ) . In this 
section, we apply our reconstruction technique to several 
real-world examples of complex metabolic networks and 
consider the degree to which different high-modularity 
partitions agree on the large-scale modular structure. 

Metabolic networks are an interesting test case for this 
analysis because the answers to many questions in sys- 
tems biology depend on our ability to accurately char- 
acterize their modular and hierarchical structure [5HTT] 
and modularity maximization has been used extensively 
in their analysis. We emphasize, however, that our re- 
sults likely also hold for other types of networks, such as 
social or technological networks, since the ruggedness of 
the modularity landscape depends only on the presence 
of modular or hierarchical structure. 

Figure [4] shows the reconstructed modularity function 
for the largest connected component in the metabolic net- 
work of the spirochaete Treponema pallidum [n — 482, 
m = 1223) and Fig. 13 in Appendix |f] shows the func- 
tions for the mycoplasmatales Mycoplasma pneumoniae 
and Ureaplasma parvum |12j . All three modularity func- 
tions are similar to those of the modular and hierarchi- 
cal model networks shown in Figs. [2] and [3] exhibiting 
a broad and rugged region of high-modularity partitions 
with no clear global maximum. 
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FIG. 5: (color online) The fraction remaining of the mean 
pairwise distance between sampled high-modularity partitions 
when all but the k' largest groups in each partition are merged 
into a single group, for the T. pallidum network and for a 
random graph with the same degree sequence. (Error bars 
indicate one standard deviation; inset shows the distribution 
of the number of groups in a partition.) In both cases, the 
fraction converges slowly on as more modules are merged, 
indicating that the majority of the structural diversity cap- 
tured by these partitions is driven by significant differences 
in the composition of the largest few identified groups. This 
behavior is especially true for the empirical network. 



A. Large-scale similarity 

From a pragmatic perspective, the multiplicity of high- 
modularity partitions is more troublesome if they dis- 
agree on the large-scale modular structure of the net- 
work. In contrast, if high- modularity partitions disagree 
mainly on the composition of the smallest few modules, 
but agree on the composition of the larger ones, modu- 
larity maximization can provide useful information about 
a network's large-scale modular structure in spite of the 
degeneracies. 

Using our sampled partitions, a direct and straightfor- 
ward test of this possibility is the following. For each lo- 
cally optimal partition, we set aside the k' largest identi- 
fied modules and then merge the remaining smaller mod- 
ules into a single group. If most of the differences between 
local optima are in the composition of the smaller mod- 
ules, the mean pairwise distance between the reduced 
partitions will vanish as we merge more of these small 
modules into a single group. However, if a significant 
fraction of the original mean pairwise distance remains 
even when almost all of the smaller modules have been 
merged, i.e., when k' is small, then we have significant ev- 
idence that the high-modularity partitions fundamentally 
disagree on the networks' large-scale modular structure. 

Figure [5] shows the results of this test using the sam- 
pled partitions of the T. pallidum metabolic network, for 



10 



9 > fc' > 1 (with similar results for the other metabolic 
networks; see Fig. 14 in Appendix |f| . For comparison, 



we also show results for a random graph with the same 
degree sequence, which has no real modular structure. 
Notably, the mean pairwise distance among the original 
empirical partitions decreases very little (0.05%) when 
we retain only the fc' = 9 largest groups; in contrast, the 
random graph exhibits a much larger change (13%). 

Counter-intuitively, this implies that the high- 
modularity partitions of the random graph exhibit 
greater agreement on the composition of the largest few 
modules, i.e., less structural diversity, than do the high- 
modularity partitions of the empirical network. Addi- 
tionally, the mean pairwise distance for the T. pallidum 
partitions only falls below 50% of its original value when 
we merge all but the k' = 2 largest groups. That is, 
almost half the variation of information between high- 
modularity partitions is explained by differences in the 
composition of their two largest modules, with the re- 
mainder being caused by disagreements on the composi- 
tion of all other modules. 

Thus, partitions that are "close" in terms of their mod- 
ularity scores can be very far apart in terms of their par- 
tition structures and most of the differences come from 
disagreements on the composition of the largest identi- 
fied modules. This suggests that the degeneracies in the 
modularity function really do pose a problem for inter- 
preting the structure of any particular partition and that 
a high modularity score provides very little information 
about the underlying modular structure. 



B. Structural summary statistics 

For some research questions, however, the precise com- 
position of the modules is not as important as the value 
of some statistical summary of the partition's structure. 
Thus, an important question is whether high-modularity 
partitions tend to agree on the values of simple summary 
statistics, even if they disagree on the precise partition 
structure. Naturally, the particular statistical quantity 
will depend on the research question being asked and the 
safest approach is to directly test whether the quantity 
measured on one high- modularity partition is represen- 
tative of its distribution over many high-modularity par- 
titions. Here, we briefly study two such summary statis- 
tics: the mean module density and the distribution of 
module sizes. We note, however, that tests of reliability 
like these may not generalize to larger networks, as the 
number of degenerate solutions, and thus their potential 
structural diversity, grows rapidly with the size of the 
network tested (see Section III I . 

Using the same high-modularity partitions of the T. 
pallidum metabolic network, along with a second set 
of high-modularity partitions derived using the Louvain 
method [25], we compute the average module density 
(ei/rii) and the distribution of module sizes p{nj) for each 
partition. The former statistic can be immediately com- 
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FIG. 6: (color online) The cumulative distribution functions 
for the (a) mean module density (ei/ni) and (b) Kolmogorov- 
Smirnov distance d between module size distributions p(ni), 
among sampled high-modularity partitions of the T. pallidum 
network. (A dot indicates the distributional mean.) In both 
cases, we show the distributions for partitions derived using 
simulated annealing and using the Louvain method. In both 
cases, the SA partitions exhibit a much less tightly peaked 
distribution than those derived using the Louvain method, 
indicating that high-modularity partitions exhibit non-trivial 
structural diversity, even under these summary statistics. 



pared between partitions; to compare the latter, we com- 
pute pairwise Kolmogorov-Smirnov (KS) [55 distances 
between the distributions. If the statistic's distribu- 
tion is tightly concentrated, then any particular high- 
modularity partition can be assumed structurally rep- 
resentative, under that summary statistic, of the other 
high-modularity partitions. 

Figure [6] shows the resulting distributions for our two 
simple measures. In both cases, the distributions for the 
Louvain partitions are indeed relatively tightly concen- 
trated, illustrated by a large increase of the CDF over 
a small range in the x variable. This suggests that 
the Louvain method tends to find partitions with rel- 
atively similar module densities and module sizes. In 
contrast, however, the SA partitions exhibit much more 
variance under both measures. This indicates that the 
SA method more accurately samples the full structural 
diversity of the high-modularity partitions than docs the 
Louvain method. (To be fair, the Louvain method was 
not designed to find representative high-modularity par- 
titions, but rather to be very fast at finding some high- 
modularity partition.) 

On this network, both methods tend to produce parti- 
tions with similar mean module densities: the estimated 
means are {ei/rii) = 1.421±0.013 [meanistd. err.] for SA 
versus 1.520 ± 0.005 for Louvain. The Louvain method, 
however, underestimates this statistic's standard devia- 
tion by about a factor of 2 relative to the SA method 
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(cr = 0.223 for SA versus a = 0.095 for Louvain). 

Thus, these resuhs support our conclusion above: 
high-modularity partitions can exhibit non-trivial struc- 
tural diversity, even under simple structural measures like 
the mean module density and the distribution of module 
sizes. Of these two, the mean module density is more reli- 
ably representative, although even it exhibits non-trivial 
variance. In contrast, the distribution of module sizes 
exhibits a great deal of variation. Thus, we generally 
recommend a cautious approach when interpreting the 
structure of one or a few high-modularity partitions, as 
their structural characteristics may not be representative 
of alternative high-modularity solutions. 



VII. DISCUSSION 

To summarize, the modularity function Q poses three 
distinct problems in scientific applications: 

1. The optimal partition may not coincide with the 
most intuitive partition (the resolution limit prob- 
lem [32l I34H36] ). an effect driven primarily by the 
consequences of assuming that inter-module con- 
nectivity follows a random graph model (see Sec- 
tion |ll|. 

2. There are typically an exponential number of struc- 
turally diverse alternative partitions with modular- 
ities very close to the optimum (the degeneracy 
problem). This problem is most severe when ap- 
plied to networks with modular structure; it occurs 
for weighted, directed, bi-partite and multi-scale 
generalizations of modularity; and it likely exists 
in many of the less popular partition score func- 
tions for module identification (see Sections III V] 
andlVl|. 



3. 



The maximum modularity score Qmax depends on 
the size of the network n and on number of modules 



k it contains (see Section IV I 



To be practically useful, we believe that future method- 
ological work on module identification in complex net- 
works must, in particular, include some effort to address 
the existence of degenerate solutions and the problems 
they pose for interpreting the results of the procedure. 

The discovery of extreme degeneracies in the modular- 
ity function also provides an answer to a nagging question 
in the literature: given that maximizing modularity is 
NP-hard in general 122! , why do so many different heuris- 
tics perform so well at maximizing it in practice? And 
further, why do different methods often return different 
partitions for the same network? The answer is that the 
exponential number of high-modularity solutions makes 
it easy to find some kind of high-scoring partition, but, 
simultaneously, their enormous number obscures the true 
location of the optimal partition. 

In this light, it is unsurprising that different heuristics 
often yield different solutions for the same input network. 



particularly for very large networks. Different heuris- 
tics will naturally sample or target distinct subsets of 
the high-modularity partitions due to their different ap- 
proaches to searching the partition space (for instance, 
see Fig. [7j in Appendix [b]). This implies that the re- 
sults of deterministic algorithms, such as greedy opti- 
mization p5^E5] or spectral partitioning [371 IMj i which 
return a unique partition for a given network, should be 
treated with particular caution, since this behavior tends 
to obscure the magnitude of the degeneracy problem and 
the wide range alternative solutions. 

The structural diversity of high-modularity partitions 
(Figs.[5]and[6| suggests that a cautious stance is typically 
appropriate when applying modularity maximization to 
empirical data. Unless a particular optimization or sam- 
pling approach can be shown to reliably find representa- 
tive high-modularity partitions, the precise structure of 
any high-modularity partition or statistical measures of 
its structure should not be completely trusted. 

Finally, even the estimated modularity score Qmax, 
which may be "significant" relative to a simple random 
graph [41 , should be treated with an ounce of caution as 
it is almost always a lower bound on the maximum mod- 
ularity (but see Ref. [26]) and its accuracy necessarily 
depends on the particular algorithm and network under 
consideration. As a result, an estimate of Qmax should 
not be mistaken for a network property that can be fairly 
compared across networks: as we showed in Section |IV[ 
Qmax depends on the number of module-like structures in 
the network and on their interconnectivity, both of which 
are limited by the network's size. Thus, variation in size 
can induce variation in the maximum modularity value 
and a fair comparison between different networks must 
control for this correlated behavior. 

Although the degeneracy problem presents serious is- 
sues for the use of modularity maximization in scien- 
tific contexts, certain kinds of sophisticated approaches 
may be able to circumvent or mitigate some of its con- 
sequences. For example, Sales-Pardo et al. (31] re- 
cently proposed combining information from many dis- 
tinct high-modularity partitions to identify the basic 
modular structures that give rise to the degenerate so- 
lutions. To be useful, however, the high-modularity par- 
titions should be sampled in an unbiased and relatively 
complete way, e.g., by using a Markov chain Monte Carlo 
algorithm [3D]. This type of approach may also provide 
a way to identify overlapping |52j or hierarchical mod- 
ules [3T]. (That being said, hierarchical structure poses 
a special problem for modularity, because, strictly speak- 
ing, there is no "correct" partition of a hierarchy; at best, 
a good partition will identify the modules at a particu- 
lar hierarchical level. Separate tools are needed to in- 
fer distinct levels.) On the other hand, the difficulty of 
constructing an unbiased sample of an exponential num- 
ber of degenerate solutions may prevent these methods 
from uncovering subtle or large-scale relationships, par- 
ticularly in larger networks. 
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Another set of promising techniques try to estimate the 
statistical significance of a high-modularity partition |181 
132]) i-e-, to answer the question of how much true struc- 
ture is captured by a particular high-modularity parti- 
tion. And, techniques based on local methods [5^ I57j. 
which do not attempt to partition the entire network, or 
on random walks over the network j58H61| , may provide 
useful alternatives to modularity maximization, although 
these may still exhibit degenerate behavior. 

A particularly promising class of techniques for identi- 
fying modular and hierarchical structures relies on gen- 
erative models and likelihood functions. Stochastic block 
models [JHl IS^tiBB] and hierarchical block models [37] are 
attractive because they can allow module densities to 
vary independently, although their flexibility can come 
with computational costs and their results can be more 
difficult to interpret. In some cases, these models can 
capture overlapping modules [65j . In general, the likeli- 
hood framework presents several opportunities not cur- 
rently available for modularity-based methods. For in- 
stance, by comparing the likelihoods of empirical net- 
work data under different structural models, researchers 
can give statistically principled answers to model selec- 
tion questions, such as, is this network more hierarchical, 
more modular, or neither? But, likelihood functions can 
also exhibit extreme degeneracies and optimization tech- 
niques for module identification should likely be treated 
with caution. To ensure good results, it may be necessary 
to use a sampling approach [¥7] . 

In closing, we note that the development of objective 
and accurate methods for identifying modular and hier- 
archical structures in empirical network data is crucial for 
many systems-level questions about the structure, func- 
tion, dynamics, robustness and evolution of many com- 
plex systems. The magnitude of the degeneracy problem, 
and the dependence of Qmax on the size and number of 
modules in the network, suggests that modules identified 
through modularity maximization should be treated with 
caution in all but the most straightforward cases. That 
is, if the network is relatively small and contains only a 
few non-hierarchical and non-overlapping modular struc- 
tures, the degeneracy problem is less severe and modu- 
larity maximization methods are likely to perform well. 
In other cases, modularity maximization can only pro- 
vide a rough sketch of some parts of a network's modular 
organization. We look forward to the innovations that 
will allow it to reliably yield more precise insights. 
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Appendix A: The dependence of i 
ring network 



on n for the 



A simple case where it is straightforward to work out 
the precise dependence of Qmax on network size is the 
ring network from Section |llj 

Consider such a network with k cliques, each composed 
of exactly c nodes, and where we hold c constant while 
increasing n, i.e., we add more modules to the ring such 
that k — n/c. If the optimal partition merges H. adjacent 
cliques (due to the resolution limit), then it can be shown 
that the modularity is exactly 



(Al) 




where 




Thus, as n — > oo the second and third terms in Eq. ( Al ) 
vanish like 0{l/^/n) and Qmax 1- 

As a brief aside, we now connect this result to the large- 
scale behavior of the "plateau" region of the modularity 
function mentioned in the main text. For concreteness, 
we define the plateau as the set of partitions with mod- 
ularity scores within 10% of Qmax- 

To begin, we note that the asymptotic result given 
above implies that the height of the plateau, which is 
simply the maximum modularity value, increases with k. 
We now characterize the size of the plateau by consid- 
ering the number of partitions for med by merging con- 
there are 2^ such 
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nected cliques. As shown in Section . 
partitions for the ring network because there are k edges 
connecting cliques, each of which can be cut or not cut 
to create a different group. 

Let Qi be the modularity score of the intuitive parti- 
tion, i.e., the one that places each clique in its own group. 
It can be shown that the ratio Qi/Qmax is a monotoni- 
cally decreasing function of k whose limit is 
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If the cliques are composed of at least c — b nodes, this 
ratio is 10/11 and the intuitive partition Qi is always 
somewhere within the plateau region. 

If the optimal partition merges £ adjacent cliques, then 
it can be shown that there are at least 2'^'-^"^/^-' partitions 
with no more than i cliques in a single module and each 
of these partitions will be within 10% of the maximum 
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modularity because Qi bounds their modularity from be- 
low. Since i = 0{^/n) — 0{Vk), in the limit of large 
k, the number of partitions depends only on the num- 
ber of cliques and we have an exponential expansion in 
the number of partitions in the high-modularity plateau. 
Thus, as k grows large, both the height and the size of 
the plateau increase as well, with the latter increasing 
exponentially. 

Although the details would change for a different net- 
work structure, in principle, such an exponential expan- 
sion in the size of the plateau should be universal. This 
provides a very broad target for optimization algorithms. 



Appendix B: Simulated Annealing 

To initialize each simulating annealing (SA) sample 
run, we start the procedure at a "random" partition, in 
which we first choose a number of communities k and 
then assign each node to one of these communities with 
equal probability. 

At each step of the algorithm, a modification of the cur- 
rent partition is proposed, e.g., by moving a node from 
one group to another, by merging two groups or by split- 
ting one group into two. If this modification results in a 
partition with higher modularity, the current partition is 
replaced with the proposed one. Otherwise it is replaced 
with probability e^l'^'^'/"^, where AQ is the difference in 
modularity between the current and proposed partition 
and T is the temperature parameter, which we decrease 
according to the annealing schedule (see below). If the 
proposed modification is rejected, we retain the current 
partition and propose a new modification at the next 
step. As T 0, the algorithm is guaranteed to converge 
to a local optimum in the modularity function. 

To implement the algorithm, we must define the set of 
possible modifications (the move set), which determines 
the local neighborhood of any given partition. Different 
choices of move set can drastically alter both the conver- 
gence time of the algorithm and its ability to escape local 
optima. The choice of move set can even affect the kinds 
of local optima we sample (see below). For our purposes, 
it is less important that the algorithm converge on the 
global optimum than it is to sample a broad section of the 
modularity function in a relatively unbiased way. Some 
alternative heuristics for maximizing modularity can also 
be used to sample the modularity landscape, e.g., the 
Louvain method [25], but these often do so with par- 
ticular biases and thus are not as flexible as simulated 
annealing for obtaining a clear view of the modularity 
function's degeneracies (e.g., see Fig. [6]). 

We employed two simple move sets: (i) single node 
moves and (ii) a combination of single moves, merges 
and splits. A single node move takes a node chosen uni- 
formly at random from the n nodes in the network and 
either moves it to another group, chosen uniformly at 
random from the remaining groups, or places it in a new 
group by itself. (If the chosen node is the only member 



of its group and it is successfully moved to another ex- 
isting group, the number of existing groups decreases.) 
If the current partition has k communities, this move set 
defines a local neighborhood for any particular partition 
that is composed of wi — n{k — 1) -\- n = nk neighbor- 
ing partitions. (We note that this move set is similar 
to the partition modifications used in the Kernighan-Lin 
heuristic [S7].) 

In the second move set, we also allow merges and splits. 
With probability pm we choose two groups uniformly at 
random and merge them into a single group. Alterna- 
tively, with probability Ps we choose a group uniformly 
at random and split it into two subgroups such that the 
number of edges between them is minimized. (This dif- 
fers from the "heat bath" approach used in Ref. 15 .) 
This optimization problem is conventionally called MiN- 
CUT and we use a standard algorithmic solution for find- 
ing the minimum cut weight [68' . This way of choosing a 
split for a group typically results in a relatively good par- 
tition; in contrast, a randomly chosen bipartition would 
almost surely result in a lower modularity score and thus 
would almost always be rejected. Finally, with proba- 
bility 1 — {pm +Ps), we perform a single node move as 
described above. This move set defines a local neighbor- 
hood of size W2 = nk + (2) + k, where the first term 
comes from the single node moves (as above) and the 
other terms denote the number of merges and splits, re- 
spectively. For a particular network, we choose Pm and Ps 
so that each individual neighboring partition is proposed 
with roughly equal probability. 

Once the move set is chosen, the convergence of the SA 
algorithm is determined by the annealing schedule, which 
controls the rate at which the temperature parameter 
decreases. For simplicity, we use a geometric schedule 
T{t) = Tqt^, where Tq > is some initial temperature 
and < r < 1 is the common ratio between successive 
temperatures. For best results, Tq and r must typically 
be tuned to a particular network topology, but so long 
as they are chosen to allow the SA algorithm sufficient 
time to explore the partition space, their values do not 
significantly impact our results. 

Each sample run obeys a termination criterion that is 
derived by bounding the number of failed modifications 
needed to decide whether the current partition is a local 
optimum with high probability. Let w* be the number of 
moves required to try each of the w possible modifications 
of the current partition. It can be shown that 

Pr[w* > j3w\ogw] < w^^^'^ . 

We choose (3 such that after f3nk log(nfc) rejected modi- 
fications, there is a 95% chance that there are no modifi- 
cations that would increase the modularity of the current 
partition. When this criterion is met, the SA algorithm 
terminates. The termination criterion is only necessary 
to improve the running time of the algorithm, particu- 
larly toward the end of the annealing schedule when most 
proposed modifications result in lower modularity scores. 



14 



3 0.3 

° 0.0 












1092 



5.2 



3.5 ~ 
> 



i 



1092 



1.7 



0.0 



Partitions {N = 1093) 



FIG. 7: (color online) For the metabolic network of Tre- 
ponema pallidum, the matrix of pairwise distances calculated 
from a sample of (i) 301 unique partitions found by the 
Louvain method, (ii) 292 unique local optima sampled us- 
ing the single node move set, (iii) 200 unique local optima 
sampled using the merge-split move set and (iv) unique 300 
low-modularity partitions sampled using the single node move 
set. The inset shows the corresponding modularity score as 
a function of left-to-right ordering in the matrix. The differ- 
ent sampling methods cause the coarse block structure in the 
distance matrix. 



For practical purposes, we made two slight modifica- 
tions to the SA algorithm described above. To prevent 
the algorithm from wasting significant time oscillating 
between two partitions whose modularity scores are iden- 
tical, we implement a self-avoiding behavior: in addition 
to the ordinary acceptance conditions, a proposal is ac- 
cepted only if it represents a partition that has not pre- 
viously been visited. This requirement is very unlikely 
to deny the SA algorithm access to the entire partition 
space for any but the smallest networks while consider- 
ably improving the performance on larger networks. 

The second modification concerns the initial partition 
assignment. Instead of choosing an initial value for k 
uniformly from the set {1, . . . , n}, we first select a value 
fcmax < n and choose k uniformly from {1, . . . A:,„ax}- For 
large networks, this prevents the algorithm from spend- 
ing considerable amounts of time reducing the number 
of groups from 0{n) down to a more appropriate value, 
which mainly impacts the running time of the algorithm. 



partly on theoretical grounds, as the single node move set 
constitutes the most natural minimal changes to a parti- 
tion while merge-split moves constitute the most natural 
higher-order or large-scale change to a partition in a mod- 
ular network. Thus, our choices are principled, but they 
are not guaranteed to be optimal. It is theoretically pos- 
sible that there exists a move set, i.e., a way of defining 
which partitions are "local" to each other, such that the 
degeneracy problem we describe largely disappears and 
the modularity function seen by this algorithm exhibits 
a clear and easy-to-find global optimum. However, the 
NP-hardness result of Brandes et al. (22] implies that, in 
general, there can be no such ideal move set for modu- 
larity maximization, i.e., one that allows us to efficiently 
find the global optimum, unless P=NP [55] , 

Alternative heuristics for optimizing the modularity 
function implicitly choose different move sets than the 
ones described above. Thus, different algorithms will 
"see" different versions of the modularity function and 
they may sample or target distinct high-modularity re- 
gions of the function. To test whether our results from SA 
are specific to the SA framework and our selected move 
sets, we briefiy consider whether the partitions sampled 
by a very different heuristic — Blondel et al.'s Louvain 
method |25| . which builds a high-modularity partition 
by recursively agglomerating groups of connected nodes 
or modules until a high modularity is achieved — exhibit 
similar behavior or overlap with those sampled by the SA 
approaches. 

Using the Treponema pallidum metabolic network as 
a realistic test case, we sample several hundred high- 
modularity partitions using the Louvain method, several 
hundred using the single node move set, and several hun- 
dred using the move-split move set. For comparison, we 
include several hundred low-modularity partitions from 
the single node move set (sampled early in the SA). Fig- 
ure [7] shows the resulting matrix of pairwise distances for 
these partitions (measured by their variation of informa- 
tion; see Section [C| below) . 

Most notably, we see that there is very little overlap 
between the high-modularity partitions sampled by the 
three heuristics, suggesting that different move sets (and 
thus different algorithms) do indeed sample distinct parts 
of the modularity function. In fact, the partitions sam- 
pled by the two SA move sets overlap very little. The fact 
that these sampled regions are distinct but still exhibit 
very high modularities (inset) reinforces the fact that the 
degeneracy phenomenon is ubiquitous and suggests that 
other approaches are likely to face similar issues. 



On the choice of move set and alternative algorithms 

There are any number of alternative move sets we could 
have employed and we intentionally considered only the 
two described above. This choice is motivated partly 
by convention, as previous SA algorithms for modularity 
maximization [14 have employed similar move sets, and 



Appendix C: The Distance Between Partitions 

We quantify the differences between partitions using 
one popular notion of partition "distance" called the 
variation of information (VI), which was introduced by 
Meila [SO]. This measure satisfies the standard axioms 
for a distance metric and thus preserves many of the in- 
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FIG. 8: The variation of information (VI), as a function of the 
size of the original module, when we move a single node into a 
new group, move 1/4 of the original nodes into a new group, 
or move 1/2 of the original nodes into a new group. In all 
cases, the VI increases monotonically, but with a slope that 
depends on the fraction of the original module being moved. 



tuitive properties we expect from a distance measure. 
Further, it does not rely on finding a maximally overlap- 
ping alignment of the partitions, which makes it fast to 
calculate. For a thorough discussion of other notions of 
distance between partitions, and of the advantages of the 
VI measure, see Ref. [T5] . 

The VI allows us to quantitatively test the hypothe- 
sis that suboptimal high-modularity partitions disagree 
with the optimal partition mainly in small or trivial 
ways, which would correspond to very small VI values 
(close to 0), e.g.. Fig. [5j It also allows us to construct 
low-dimensional visualizations of the sampled modularity 
function. 

Given partitions C and C", their VI is defined as 

VI(C, C") = H{C) + H{C') - 2/(C; C") (CI) 
^H{C,C')-I{C\C') , (C2) 

where H(.) is the entropy function and /(.;.) is the mu- 
tual information function. Using the definitions 

H{C, C')^-J2p(hj)iogp{i,j) 
^-^ n \ n J 



(C3) 



I{C;C') = Y,Pihj)log 



P{i)p{j) 



run-j 



(C4) 



we can further simplify Eq. ( C2 ) to an expression that 
depends only on counts: 



VI(C, C") 



n ^ — ' 



log 



(C5) 



where ni is the number of nodes in group i in C, nj is 
the number of nodes in group j in C" and n^.j is the total 
number of nodes in group i in C and in group j in C. 
Two partitions of the network are the same if and only if 
VI(C, C") = and the maximum possible VI is given by 
log n where n is the number of nodes in the network. 



Two example calculations using VI 

To give the reader a more intuitive feeling for how VI 
behaves, we briefly calculate a few distances using the 
mis-merged partitions we encountered in the main text. 

First, consider a partition, with k modules, in which 
one module has g nodes. If we move h nodes from this 
module into a new group, the distance between the orig- 
inal and the new partition is 



Yl{C, C') = -[g\ogg-ig- h) \og{g ~ h) 
n 



hlogh] , 
(C6) 



which obtains its maximum of {g/n) log 2 when h — g/2. 
Fig. |8] shows the functional dependence of the VI for sev- 
eral choices of g and h. Most notably, under VI, parti- 
tions that differ by a merge of two groups or a split of 
one group are more distant than those that differ only 
by a few displaced nodes. From the discussion in the 
main text, partitions that differ by merges and splits 
are precisely the kind we expect to find among the high- 
modularity but suboptimal partitions. 

For a second example, consider the split and merge 
operation discussed for a hierarchical network in the main 
text, where the modules i = {a, 6} and j — {c, d\ are 
both of size g and their submodules contain g/2 nodes 
each. The alternate partition i' = {a, c} and j' — {b, d\ 
has a distance 



VI(C,C')=4(5/n) log 2 



(C7) 



from the original one. Thus, this split and merge opera- 
tion produces a partition that is four times the distance 
from the original partition as one obtained by a single 
bisection of one group (the previous example). 

Finally, we note that the VI notion of distance is not 
without its weaknesses. The most significant of these is 
its unintuitive scale. Further, the maximum VI scales 
with the number of nodes or the number of modules in 
the partition and thus we cannot reliably compare VI dis- 
tances between networks with different sizes or number 
of modules. Thus, our results here and in the main text 
rely only on relative distances for partitions of the same 
network and not on any particular numerical value. 
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FIG. 9: (color online) The modularity function for the 
metabolic network of the spirochaete Treponema pallidum re- 
constructed using the Jaccard distance, which shows the same 
qualitative structure that we observed using the variation of 
information. The inset shows the rugged high-modularity re- 
gion. This suggests that our results do not depend sensitively 
on the choice of distance metric. 



Alternative distance measures 

Although the variation of information is a satisfactory 
partition distance measure for our needs, we would like to 
ensure that our main results (e.g., the ruggedness of the 
high modularity plateau and the large-scale structural 
disagreements between the high-modularity partitions) 
do not depend sensitively on the distance measure used. 
All that we technically require is a distance measure that 
satisfies the standard metric axioms. Another possible 
choice is provided by the Jaccard distance J, which is 
defined as 



J(C, C) = 



floi + 0,10 



(C8) 



where aoi is the number of pairs of nodes that are in the 
same module in C but different modules in C and vice 
versa for oiq. 

Figure |9] shows the reconstructed modularity function 
for the T. pallidum metabolic network using the Jac- 
card distance in place of the variation of information. 
Although the precise shape of the modularity function 
is different, we observe the same qualitative behavior 
present in the VI landscape: a rugged high-modularity 
plateau surrounded by a sea of lower modularity parti- 
tions. Furthermore, the Jaccard distance yields similar 
results to the variation of information when conducting 



the coarse-graining analysis outlined in Section VI This 
suggests that our results do indeed capture real proper- 
ties of the modularity landscapes of these networks and 
do not depend on our choice of distance measure. 



Appendix D: Curvilinear Component Analysis 

In principle, a matrix of pairwise VI distances for parti- 
tions of a network (like the one shown in Fig. [?]) contains 
all the information necessary to understand the structure 
of the modularity function. However, the non-Euclidean 
nature of the partition space makes this information dif- 
ficult to interpret. Thus, we use an embedding algorithm 
to project the distance matrix onto a two-dimensional 
Euclidean landscape. The modularity scores of each par- 
tition provide a third dimension. 

The projection from the original space (hereby referred 
to as the data space) to the 2D landscape (known as the 
latent space) can be phrased as an optimization problem: 
we seek an assignment of partitions to positions in the la- 
tent space that preserves the original pairwise distances 
as much as possible. The quality of any particular assign- 
ment is conventionally characterized by a stress function, 
which measures the errors in the projected distances. 

We use the curvilinear component analysis (CCA) al- 
gorithm [51], which preserves local distances at the ex- 
pense of some amount of distortion at larger distances. 
(Other suitable embedding algorithms exist, e.g., Sam- 
mon's non-linear mapping '70], but these often have con- 
cave error functions and are thus not guaranteed to con- 
verge.) Given a set of distances doix^y) in the data 
space, we wish to assign distances dL{x,y) in the latent 
space so as to minimize the stress function: 



2^\-dDix,y) 



dL{x,y)f F^{dL{x,y)) , (Dl) 



where Fx is a weight function. Here, we take -Fa to be 
a linear combination of Heaviside step functions chosen 
to produce a decreasing function with a null first deriva- 
tive nearly everywhere (for details, see Ref. [71]). This 
choice tends to conserve shorter distances while occasion- 
ally producing "tears" for large distances. The stress 
function is then minimized using the optimization proce- 
dure designed by Demartines and Herault [71] . 

In order to generate a relatively unbiased sampling of 
the modularity function from a large set of independent 
SA runs, i.e., to ensure that we sample both high and 
low modularity partitions and that our samples are rela- 
tively independent of each other, we do the following. A 
quarter of our sampled partitions are obtained by choos- 
ing the local optimum found when the run terminates. 
Each remaining partition is chosen by running the SA 
algorithm to its <th step, where t is drawn iid from a geo- 
metric distribution. By drawing only one partition from 
each run, and combining results from a large number of 
independent runs, we obtain a relatively even sampling 
of the high-modularity region of the modularity function. 

Notably, this procedure does not produce an unbiased 
sample, which could be obtained using a Markov chain 
Monte Carlo technique [30]. However, our goal is not a 
fully unbiased sample of partitions; rather, we seek a suf- 
ficiently even and unbiased sample of the high-modularity 



17 




6 



FIG. 10: (color online) The error rate of the embedding Ecca 
as a function of the number of steps t in the optimization algo- 
rithm for 25%, 50%, 75% and 100% of the 997 samples for the 
ring network (Fig. 2), normalized by sample size. The 0(t^^) 
decay in the error rate shows that the CCA algorithm is ro- 
bust to the number of samples and provides highly accurate 
a embedding of the original distances. 



FIG. 11: (color online) Latent space distances as a function 
of data space distances for a sample of 997 partitions of the 
ring network. Point sizes are weighted by the average modu- 
larity of the two partitions: larger circles represent distances 
between two high-modularity partitions whereas small circles 
correspond to distances between low modularity partitions. 



partitions that we can study the question of the modu- 
larity function's degeneracies and get a realistic recon- 
struction of this region of the modularity function. By 
biasing our sample in favor of high-modularity partitions, 
but sampling them independently, we can achieve that 
goal. The partitions with intermediate modularity val- 
ues are included to ensure some coverage of mid- and 
low-modularity regions. 

We validate the results of our embeddings in three 
ways. First, we test whether the qualitative structure 
of the embedded functions depends on the number of 
samples used. Using the ring network, we subsampled 
the 997 partitions used to construct Fig. [5] at the 25%, 
50% and 75% levels. Adding more samples should never 
decrease the ruggedness in the high-modularity region, 
but if the landscape changes significantly, it could in- 
dicate a problem with the embedding. Comparing the 
results, we find that the qualitative structure of the 
four landscapes — including the rugged structure of the 
plateau region (Fig. [2] inset) — is independent of the sub- 
sampling rate, suggesting that our full sample is more 
than adequate to give an accurate representation of the 
modularity function's structure. 

Second, we verify that the decrease in the stress func- 
tion Ecca is well behaved as the number of optimization 
steps increases, i.e., we see no evidence for pathological 
behavior in the embedding procedure. For all four of the 
subsampling levels described above, we find that the error 
decays roug hly as 0{t~^) in the number of optimization 
steps t (Fig. 10 ) and the mean final error is roughly 10""*. 
Since the mean distance between points is of order 1, this 
error rate implies that the embedding is quite accurate. 



Finally, we test whether our choice of F\ conserves the 
local structure of the modularity function. We test this 
by means of a Shepard diagram [72^ . which plots a ran- 
dom sample of the distances in the data space against 
the corresponding distances in the latent space. A Shep- 
ard diagram for the embedded ring network is shown in 
Fig. [TT] We note that deviations from the diagonal occur 
primarily at larger distances and that the local structure 
(bottom left of the figure) is generally well preserved. 
Even for those points where distance is not preserved, 
the algorithm errs on the side of assigning smaller dis- 
tances, which would only tend to make the landscape 
appear less rugged, i.e., more smooth, than it truly is. 



Appendix E: Hierarchical Random Graphs 

The hierarchical random graph (HRG) model, recently 
introduced by Clauset, Moore and Newman [47], provides 
a simple but realistic way to generate networks with hi- 
erarchical structure. However, the full HRG model is too 
flexible for our purposes. Instead, we employ a simplified 
version that fixes the hierarchical structure and the way 
the internal probability values vary. 

Under our simplified model, we arrange n — 2'^'""'^ 
nodes into groups according to a balanced binary tree 
structure with d^ax + 1 levels (Fig. 12). We assign edges 



between nodes by letting the internal probability values 
Pr increase with their distance from the root of the tree. 
This regularity gives the network an assortative struc- 
ture, in which modules become more dense as we move 
down in the dendrogram. Mathematically, we say that if 
the lowest common ancestor of two nodes is at level d in 
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level to move up in the hierarchy as the network grows. 
The resolution limit always implies that the optimal par- 
tition is composed of agglomerations of smaller modules, 
but in this hierarchical network, these agglomerations are 
simply composed of modules from lower down in the hi- 
erarchy. This analysis, however, says nothing about the 
degeneracies that characterize the modularity function 
in the local neighborhood of the optimal partition, which 
we discussed in Section ITlTl 



FIG. 12: (color online) An example of our simplified hierar- 
chical random graph (HRG) model, with 8 nodes and 4 levels 
(including the leaves), in which the nodes are organized into 
a balanced binary tree and the internal probabilities increase 
as you move from the root toward the leaves. 



the tree, they are connected with probability 

Prid) = 2''+i-^'— . (El) 

The optimal partition of a hierarchical network 

For this simplified HRG model, we now derive an esti- 
mate of the level of the hierarchy whose group structure 
yields the optimal partition. For convenience, we take a 
mean- field approach and consider the average modularity 
(Q) of an ensemble of instances drawn from the model. 
In this case, the modularity function takes the form 



(m) 



2(m) 



(E2) 



Because of the symmetry of the binary tree, the optimal 
partition must consist of groups of the same size. That 
is, to find the maximum modularity partition, we must 
simply find the level d* in the hierarchy that maximizes 



Eq. ( E2 ) . Accounting for the regular way the group struc- 



ture changes with the height d from the bottom of the 
tree, this implies that Eq. (E2) simplifies to 



(Q> = 1 - 



- 2" 



(E3) 



If we treat d as a continuous variable, we find that (Q) 
is maximized when we cut the dendrogram at 



d* — log2(nln2) 



(E4) 



Thus, this balanced and assortative hierarchical net- 
work has a particular behavior with respect to its resolu- 
tion limit j32j . i.e., the resolution limit causes the optimal 



Appendix F: Additional Results for Metabolic 
Networks 



Figure 13 shows the reconstructed modularity func- 
tions for two additional metabolic networks, for the my- 
coplasniatales Mycoplasma pneumoniae and Ureaplasma 
parvum (3 ATCC 700970) [11]. 

Figure [14] shows the corresponding coarsening analy- 
ses (analogous to Fig.js]), which confirms that the behav- 
ior of the T. pallidum described in Section VI also holds 
for these other two networks. That is, for these other 
networks, we also find significant variation in the com- 
position of the largest few identified modules across the 
high-modularity partitions, implying that the degenera- 
cies in the modularity function extend beyond simple re- 
arrangements of the smallest modules. Each inset shows 
the cumulative distribution of the number of groups in 
the sampled partition. Notably, for all three networks, 
the fraction of partitions with /c < 9 groups is not large 
enough to explain the persistence of non-trivial distances 
when all but the largest groups are merged. 

Note: the fraction of the original mean pairwise dis- 
tance VI shown in Fig. [5] and Fig. [T4| is not guaranteed 
to decrease monotonically with k' . To see why, consider 
the pair-wise geographic distances between all the cities 
in California and New York City. The average pairwise 
distance is composed of two parts: the average pairwise 
distance between Californian cities and the average dis- 
tance from each Californian city to New York City. If 
there are very many Californian cities in our calculation, 
the overall pairwise average will tend to be dominated by 
the former term, which has O(n^) elements, rather than 
the latter, which has only 0{n) elements. As we merge 
cities within California, the size of the first term decreases 
and the average distance becomes more representative of 
the distance between California and NYC. 
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FIG. 13: (color online) Reconstructed modularity functions for the metabolic networks of the mycoplasmatales Mycoplasma 
pneumoniae (upper; largest component; n = 354 and m — 856) using 1199 sampled partitions and Ureaplasma parvum (lower; 
largest component; n = 300 and m = 712) using 1199 sampled partitions, each showing a large amount of degeneracy among 
the high-modularity partitions (insets). 




Number of largest modules preserved, k' Number of largest modules preserved, k' 

FIG. 14: For M. pneumoniae (left) and U. parvum (right), the fraction of the mean pairwise variation of information (distance) 
between the sampled high-modularity partitions that remains when all but the k' largest groups in each partition are merged 
into a single group. (Dashed lines indicate one standard deviation; insets show the cumulative distributions of the number 
of partitions with k groups.) Notably, as for the T. pallidum network discussed in the main text (Fig. 5), the distance 
distributions change very little when all but the largest few groups in each partition are combined, indicating that most of the 
distance between partitions is driven by significant differences in the composition of the largest few groups. 
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